To build general robotic agents that can operate in many environments, it is often imperative for the robot to collect experience in the real world. However, this is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing to real-world experience: internet videos of humans using their hands. Visual priors, such as visual features, are often learned from videos, but we believe that more information from videos can be utilized as a stronger prior. We build a learning algorithm, VideoDex, that leverages visual, action, and physical priors from human video datasets to guide robot behavior. These action and physical priors in the neural network dictate the typical human behavior for a particular robot task. We test our approach on a robot arm and dexterous hand-based system and show strong results on various manipulation tasks, outperforming various state-of-the-art methods. Videos at https://video-dex.github.io
The ability to learn from human demonstration endows robots with the ability to automate various tasks. However, directly learning from human demonstration is challenging since the structure of the human hand can be very different from the desired robot gripper. In this work, we show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning, where a five-finger human dexterous hand robot gradually evolves into a commercial robot while repeatedly interacting in a physics simulator to continuously update the policy that is first learned from human demonstration. To deal with the high dimensions of robot parameters, we propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy. Through experiments on human object manipulation datasets, we show that our framework can efficiently transfer the expert human agent policy trained from human demonstrations in diverse modalities to target commercial robots.
Lifelong learners must recognize concept vocabularies that evolve over time. A common yet underexplored scenario is learning with class labels over time that refine/expand old classes. For example, humans learn to recognize ${\tt dog}$ before dog breeds. In practical settings, dataset $\textit{versioning}$ often introduces refinement to ontologies, such as autonomous vehicle benchmarks that refine a previous ${\tt vehicle}$ class into ${\tt school-bus}$ as autonomous operations expand to new cities. This paper formalizes a protocol for studying the problem of $\textit{Learning with Evolving Class Ontology}$ (LECO). LECO requires learning classifiers in distinct time periods (TPs); each TP introduces a new ontology of "fine" labels that refines old ontologies of "coarse" labels (e.g., dog breeds that refine the previous ${\tt dog}$). LECO explores such questions as whether to annotate new data or relabel the old, how to leverage coarse labels, and whether to finetune the previous TP's model or train from scratch. To answer these questions, we leverage insights from related problems such as class-incremental learning. We validate them under the LECO protocol through the lens of image classification (CIFAR and iNaturalist) and semantic segmentation (Mapillary). Our experiments lead to surprising conclusions; while the current status quo is to relabel existing datasets with new ontologies (such as COCO-to-LVIS or Mapillary1.2-to-2.0), LECO demonstrates that a far better strategy is to annotate $\textit{new}$ data with the new ontology. However, this produces an aggregate dataset with inconsistent old-vs-new labels, complicating learning. To address this challenge, we adopt methods from semi-supervised and partial-label learning. Such strategies can surprisingly be made near-optimal, approaching an "oracle" that learns on the aggregate dataset exhaustively labeled with the newest ontology.
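One way to read "leveraging coarse labels" in partial-label terms is to treat a coarse label as the set of fine classes that refine it, and to score a prediction by the total probability the model assigns to that set. The sketch below illustrates this marginalized negative log-likelihood. The class names, the `FINE_TO_COARSE` mapping, and the function name are hypothetical, and the abstract does not confirm that this is the exact loss used under LECO.

```python
import math

# Hypothetical ontology: each fine class refines exactly one coarse class.
FINE_TO_COARSE = {
    "husky": "dog",
    "poodle": "dog",
    "school-bus": "vehicle",
    "sedan": "vehicle",
}


def coarse_label_nll(fine_probs: dict[str, float], coarse_label: str) -> float:
    """Negative log-likelihood of a coarse label under fine-class probabilities.

    The probability of the coarse label is the sum of the probabilities of
    all fine classes that refine it (marginalization over the candidates).
    """
    mass = sum(
        p for fine, p in fine_probs.items()
        if FINE_TO_COARSE[fine] == coarse_label
    )
    if mass <= 0.0:
        raise ValueError("no probability mass on classes refining the label")
    return -math.log(mass)


if __name__ == "__main__":
    probs = {"husky": 0.5, "poodle": 0.25, "school-bus": 0.125, "sedan": 0.125}
    # An old sample labeled only "dog" is scored by -log(0.5 + 0.25).
    print(coarse_label_nll(probs, "dog"))
```

With a loss of this shape, samples that carry only the old coarse label can still contribute a training signal to a classifier over the new fine labels, without forcing a choice among the candidate fine classes.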
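Contrastive methods have led a recent surge in the performance of self-supervised representation learning (SSL). Recent methods such as BYOL or SimSiam purportedly distill these contrastive methods down to their essence, removing bells and whistles, including negative examples, that do not contribute to downstream performance. These "non-contrastive" methods work surprisingly well without using negatives, even though the global minimum lies at trivial collapse. We empirically analyze these non-contrastive methods and find that SimSiam is extraordinarily sensitive to dataset and model size. In particular, SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size. We propose a metric to measure the degree of this collapse and show that it can be used to forecast downstream task performance without any fine-tuning or labels. We further analyze architectural design choices and their effect on downstream performance. Finally, we demonstrate that shifting to a continual learning setting acts as a regularizer and prevents collapse, and that a hybrid between continual and multi-epoch training can improve linear probe accuracy by as many as 18 percentage points using ResNet-18 on ImageNet.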
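We approach the problem of learning by watching humans in the wild. While traditional approaches in imitation and reinforcement learning are promising for learning in the real world, they are either sample inefficient or constrained to lab settings. Meanwhile, there has been a lot of success in processing passive, unstructured human data. We propose tackling this problem via an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective. We call our method WHIRL: In-the-Wild Human Imitating Robot Learning. WHIRL extracts a prior over the intent of the human demonstrator and uses it to initialize the agent's policy. We introduce an efficient real-world policy learning scheme that improves through interactions. Our key contributions are a simple sampling-based policy optimization approach, a novel objective function for aligning human and robot videos, and an exploration method to boost sample efficiency. We show one-shot generalization and success in real-world settings, including 20 different manipulation tasks in the wild. Videos and talk at https://human2robot.github.io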
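Recent advances in legged locomotion have enabled quadrupeds to walk on challenging terrains. However, bipedal robots are inherently more unstable, and hence it is harder to design walking controllers for them. In this work, we leverage recent advances in rapid adaptation for locomotion control and extend them to bipedal robots. Similar to existing works, we start with a base policy that produces actions while taking as input an estimated extrinsics vector from an adaptation module. This extrinsics vector contains information about the environment and enables the walking controller to rapidly adapt online. However, the extrinsics estimator could be imperfect, which might lead to poor performance of the base policy, which expects a perfect estimator. In this paper, we propose A-RMA (Adapting RMA), which additionally adapts the base policy to the imperfect extrinsics estimator by finetuning it using model-free RL. We demonstrate that A-RMA outperforms a number of RL-based baseline controllers and model-based controllers in simulation, and show zero-shot deployment of a single A-RMA policy that enables a bipedal robot, Cassie, to walk in a variety of real-world scenarios beyond what it has seen during training. Videos and results at https://ashish-kmr.github.io/a-rma/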
Current supervised visual detectors, though impressive within their training distribution, often fail to segment out-of-distribution scenes into their constituent entities. Recent test-time adaptation methods use auxiliary self-supervised losses to adapt the network parameters to each test example independently and have shown promising results towards generalization outside the training distribution for the task of image classification. In our work, we find evidence that these losses can be insufficient for instance segmentation tasks, without also considering architectural inductive biases. For image segmentation, recent slot-centric generative models break such dependence on supervision by attempting to segment scenes into entities in a self-supervised manner by reconstructing pixels. Drawing upon these two lines of work, we propose Slot-TTA, a semi-supervised instance segmentation model equipped with a slot-centric inductive bias, that is adapted per scene at test time through gradient descent on reconstruction or novel view synthesis objectives. We show that test-time adaptation in Slot-TTA greatly improves instance segmentation in out-of-distribution scenes. We evaluate Slot-TTA in several 3D and 2D scene instance segmentation benchmarks and show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors and self-supervised test-time adaptation methods.
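We build a system that enables anyone to control a robot hand and arm simply by demonstrating motions with their own hand. The robot observes the human operator via a single RGB camera and imitates their actions in real time. Human hands and robot hands differ in shape, size, and joint structure, and performing this translation from a single uncalibrated camera is a highly underconstrained problem. Moreover, the retargeted trajectories must effectively execute tasks on a physical robot, which requires them to be temporally smooth and free of self-collisions. Our key insight is that while paired human-robot correspondence data is expensive to collect, the internet contains a massive corpus of rich and diverse human hand videos. We leverage this data to train a system that understands human hands and retargets a human video stream into a robot hand-arm trajectory that is smooth, swift, safe, and semantically similar to the guiding demonstration. We demonstrate that it enables previously untrained people to teleoperate a robot on various dexterous manipulation tasks. Our low-cost, glove-free, marker-free remote teleoperation system makes robot teaching more accessible, and we hope that it can aid robots in learning to act autonomously in the real world. Videos at https://robotic-telekinesis.github.io/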
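A popular paradigm in robotic learning is to train a policy from scratch for every new robot. This is not only inefficient but also often impractical for complex robots. In this work, we consider the problem of transferring a policy across two different robots with significantly different parameters such as kinematics and morphology. Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail because the optimal action and/or state distributions are mismatched across different robots. In this paper, we propose a novel method named $REvolveR$ that uses continuous evolutionary models for robotic policy transfer, implemented in a physics simulator. We interpolate between the source robot and the target robot by finding a continuous evolutionary change of robot parameters. An expert policy on the source robot is transferred through training on a sequence of intermediate robots that gradually evolve into the target robot. Experiments on a physics simulator show that the proposed continuous evolutionary model can effectively transfer the policy across robots and achieve superior sample efficiency on new robots. The proposed method is especially advantageous in sparse reward settings, where exploration can be significantly reduced. Code is released at https://github.com/xingyul/revolver.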
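The idea of interpolating between a source robot and a target robot can be pictured with a small sketch. It assumes, purely for illustration, that a robot is described by a dictionary of scalar physical parameters and that intermediate robots are obtained by linear interpolation. The parameter names and the linear schedule are hypothetical. The abstract does not specify the actual evolution path or how the policy is trained along it.

```python
def interpolate_robot(
    source: dict[str, float], target: dict[str, float], alpha: float
) -> dict[str, float]:
    """Blend two parameter sets: alpha=0 gives the source, alpha=1 the target."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if source.keys() != target.keys():
        raise ValueError("source and target must share the same parameters")
    return {k: (1.0 - alpha) * source[k] + alpha * target[k] for k in source}


def evolution_path(
    source: dict[str, float], target: dict[str, float], steps: int
) -> list[dict[str, float]]:
    """Return steps + 1 robots evolving from the source to the target."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [interpolate_robot(source, target, i / steps) for i in range(steps + 1)]


if __name__ == "__main__":
    # Hypothetical parameters (arbitrary units) for a hand shrinking to a gripper.
    human_like = {"finger_length": 4.0, "num_active_joints": 20.0}
    gripper = {"finger_length": 2.0, "num_active_joints": 2.0}
    for robot in evolution_path(human_like, gripper, steps=4):
        # In a transfer loop, the policy would be fine-tuned on each robot here.
        print(robot)
```

In such a scheme the policy would be fine-tuned on each intermediate robot in turn, so that every step only has to bridge a small change in the robot's parameters.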
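Continual learning (CL) is widely regarded as a crucial challenge for lifelong AI. However, existing CL benchmarks, e.g. Permuted-MNIST and Split-CIFAR, make use of artificial temporal variation and do not align with or generalize to the real world. In this paper, we introduce CLEAR, the first continual image classification benchmark dataset with a natural temporal evolution of visual concepts in the real world that spans a decade (2004-2014). We build CLEAR from existing large-scale image collections (YFCC100M) through a novel and scalable low-cost approach to visio-linguistic dataset curation. Our pipeline makes use of pretrained vision-language models (e.g. CLIP) to interactively build labeled datasets, which are further validated with crowd-sourcing to remove errors and even inappropriate images (hidden in the original YFCC100M). The major strength of CLEAR over prior CL benchmarks is the smooth temporal evolution of visual concepts with real-world imagery, including both high-quality labeled data and abundant unlabeled samples per time period for continual semi-supervised learning. We find that a simple unsupervised pre-training step can already boost state-of-the-art CL algorithms that only utilize fully supervised data. Our analysis also reveals that mainstream CL evaluation protocols that train and test on iid data artificially inflate the performance of CL systems. To address this, we propose novel "streaming" protocols for CL that always test on the (near) future. Interestingly, streaming protocols (a) can simplify dataset curation, since today's test set can be repurposed as tomorrow's train set, and (b) can produce more generalizable models with more accurate estimates of performance, since all labeled data from each time period is used for both training and testing (unlike classic iid train-test splits).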
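The "streaming" evaluation idea, always testing on the next time period and then reusing that test data for training, can be sketched as a split generator. This is a minimal illustration under stated assumptions: the function name is hypothetical, and training on the union of all past periods is one possible choice that the abstract does not spell out.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def streaming_splits(
    periods: Sequence[Sequence[T]],
) -> Iterator[tuple[list[T], list[T]]]:
    """Yield (train, test) pairs in which the test set is the next time period.

    At step t the model trains on every sample from periods 0..t (assumed
    cumulative) and is tested on period t + 1, which then joins the training
    data at the following step.
    """
    for t in range(len(periods) - 1):
        train = [sample for period in periods[: t + 1] for sample in period]
        test = list(periods[t + 1])
        yield train, test


if __name__ == "__main__":
    # Hypothetical samples bucketed by time period.
    buckets = [["a1", "a2"], ["b1"], ["c1", "c2"]]
    for train, test in streaming_splits(buckets):
        print("train:", train, "| test:", test)
```

Because each period is first used for testing and only afterwards for training, no labeled sample has to be held out permanently, in contrast to an iid train-test split within each period.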